Search | VHL Regional Portal

1.

High-Throughput Protein Production Combined with High-Throughput SELEX Identifies an Extensive Atlas of Ciona robusta Transcription Factor DNA-Binding Specificities.

Nitta, Kazuhiro R; Vincentelli, Renaud; Jacox, Edwin; Cimino, Agnès; Ohtsuka, Yukio; Sobral, Daniel; Satou, Yutaka; Cambillau, Christian; Lemaire, Patrick.

Methods Mol Biol ; 2025: 487-517, 2019.

Article in English | MEDLINE | ID: mdl-31267468

ABSTRACT

Transcription factors (TFs) control gene transcription by binding to specific DNA motifs located in cis-regulatory elements across the genome. The identification of TF-binding motifs is thus an important step toward understanding the role of TFs in gene regulation. SELEX, Systematic Evolution of Ligands by EXponential enrichment, is an efficient in vitro method that can be used to determine the DNA-binding specificity of TFs. Thanks to the development of high-throughput (HT) DNA cloning systems and protein production technology, the classical SELEX assay has been extended to high-throughput scale (HT-SELEX). We report here the detailed protocol for the cloning, production, and purification of 420 Ciona robusta DNA-binding domains. 263 Ciona robusta TF DNA-binding domain proteins were purified in milligram quantities and analyzed by HT-SELEX. The identification of 139 recognition sequences generates an atlas of protein-DNA-binding specificities that is crucial for the understanding of the gene regulatory network (GRN) of Ciona robusta. Overall, our analysis suggests that the Ciona robusta repertoire of sequence-specific transcription factors comprises fewer than 500 genes. The protocols for high-throughput protein production and HT-SELEX described in this article for the study of Ciona robusta TF DNA-binding specificity are generic and have been successfully applied to a wide range of TFs from other species, including human, mouse, and Drosophila.

Subjects

Ciona intestinalis/metabolism , Animals , Ciona intestinalis/genetics , High-Throughput Nucleotide Sequencing/methods , Protein Binding , SELEX Aptamer Technique/methods , Sequence Analysis, DNA/methods , Transcription Factors/genetics , Transcription Factors/metabolism

2.

ANISEED 2017: extending the integrated ascidian database to the exploration and evolutionary comparison of genome-scale datasets.

Brozovic, Matija; Dantec, Christelle; Dardaillon, Justine; Dauga, Delphine; Faure, Emmanuel; Gineste, Mathieu; Louis, Alexandra; Naville, Magali; Nitta, Kazuhiro R; Piette, Jacques; Reeves, Wendy; Scornavacca, Céline; Simion, Paul; Vincentelli, Renaud; Bellec, Maelle; Aicha, Sameh Ben; Fagotto, Marie; Guéroult-Bellone, Marion; Haeussler, Maximilian; Jacox, Edwin; Lowe, Elijah K; Mendez, Mickael; Roberge, Alexis; Stolfi, Alberto; Yokomori, Rui; Brown, C Titus; Cambillau, Christian; Christiaen, Lionel; Delsuc, Frédéric; Douzery, Emmanuel; Dumollard, Rémi; Kusakabe, Takehiro; Nakai, Kenta; Nishida, Hiroki; Satou, Yutaka; Swalla, Billie; Veeman, Michael; Volff, Jean-Nicolas; Lemaire, Patrick.

Nucleic Acids Res ; 46(D1): D718-D725, 2018 01 04.

Article in English | MEDLINE | ID: mdl-29149270

ABSTRACT

ANISEED (www.aniseed.cnrs.fr) is the main model organism database for tunicates, the sister-group of vertebrates. This release gives access to annotated genomes, gene expression patterns, and anatomical descriptions for nine ascidian species. It provides increased integration with external molecular and taxonomy databases, better support for epigenomics datasets, in particular RNA-seq, ChIP-seq and SELEX-seq, and features novel interactive interfaces for existing and novel datatypes. In particular, the cross-species navigation and comparison is enhanced through a novel taxonomy section describing each represented species and through the implementation of interactive phylogenetic gene trees for 60% of tunicate genes. The gene expression section displays the results of RNA-seq experiments for the three major model species of solitary ascidians. Gene expression is controlled by the binding of transcription factors to cis-regulatory sequences. A high-resolution description of the DNA-binding specificity for 131 Ciona robusta (formerly C. intestinalis type A) transcription factors by SELEX-seq is provided and used to map candidate binding sites across the Ciona robusta and Phallusia mammillata genomes. Finally, use of a WashU Epigenome browser enhances genome navigation, while a Genomicus server was set up to explore microsynteny relationships within tunicates and with vertebrates, Amphioxus, echinoderms and hemichordates.

Subjects

Databases, Genetic , Datasets as Topic , Genome , Urochordata/genetics , Animals , Biological Evolution , Ciona intestinalis/genetics , DNA/metabolism , Data Mining , Evolution, Molecular , Gene Expression , Gene Ontology , Internet , Molecular Sequence Annotation , Phylogeny , Protein Binding , Species Specificity , Transcription Factors/metabolism , Transcription, Genetic , Vertebrates/genetics , Web Browser

3.

Resolution and reconciliation of non-binary gene trees with transfers, duplications and losses.

Jacox, Edwin; Weller, Mathias; Tannier, Eric; Scornavacca, Celine.

Bioinformatics ; 33(7): 980-987, 2017 04 01.

Article in English | MEDLINE | ID: mdl-28073758

ABSTRACT

Summary: Gene trees reconstructed from sequence alignments contain poorly supported branches when the phylogenetic signal in the sequences is insufficient to determine them all. When a species tree is available, the signal of gains and losses of genes can be used to correctly resolve the unsupported parts of the gene history. However, finding a most parsimonious binary resolution of a non-binary tree obtained by contracting the unsupported branches is NP-hard if transfer events are considered as possible gene-scale events, in addition to gene origination, duplication and loss. We propose an exact, parameterized algorithm to solve this problem in single-exponential time, where the parameter is the number of connected branches of the gene tree that show low support from the sequence alignment or, equivalently, the maximum number of children of any node of the gene tree once the low-support branches have been collapsed. This improves on the best known algorithm by an exponential factor. We propose a way to choose among optimal solutions based on the available information. We show the usability of this principle on several simulated and biological datasets. The results are comparable in quality to several other tested methods having similar goals, but our approach provides a lower running time and a guarantee that the produced solution is optimal. Availability and Implementation: Our algorithm has been integrated into the ecceTERA phylogeny package, available at http://mbb.univ-montp2.fr/MBB/download_sources/16__ecceTERA and which can be run online at http://mbb.univ-montp2.fr/MBB/subsection/softExec.php?soft=eccetera . Contact: celine.scornavacca@umontpellier.fr. Supplementary information: Supplementary data are available at Bioinformatics online.

Subjects

Gene Duplication , Genes , Phylogeny , Algorithms , Computer Simulation , Cyanobacteria/genetics , Databases, Genetic , Evolution, Molecular , Extinction, Biological , Genetic Variation , Proteobacteria/genetics

4.

ecceTERA: comprehensive gene tree-species tree reconciliation using parsimony.

Jacox, Edwin; Chauve, Cedric; Szöllosi, Gergely J; Ponty, Yann; Scornavacca, Celine.

Bioinformatics ; 32(13): 2056-8, 2016 07 01.

Article in English | MEDLINE | ID: mdl-27153713

ABSTRACT

A gene tree-species tree reconciliation explains the evolution of a gene tree within the species tree given a model of gene-family evolution. We describe ecceTERA, a program that implements a generic parsimony reconciliation algorithm, which accounts for gene duplication, loss and transfer (DTL) as well as speciation, involving sampled and unsampled lineages, within undated, fully dated or partially dated species trees. The ecceTERA reconciliation model and algorithm generalize or improve upon most published DTL parsimony algorithms for binary species trees and binary gene trees. Moreover, ecceTERA can estimate accurate species-tree aware gene trees using amalgamation. AVAILABILITY AND IMPLEMENTATION: ecceTERA is freely available under http://mbb.univ-montp2.fr/MBB/download_sources/16__ecceTERA and can be run online at http://mbb.univ-montp2.fr/MBB/subsection/softExec.php?soft=eccetera . CONTACT: celine.scornavacca@umontpellier.fr. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Computational Biology/methods , Evolution, Molecular , Gene Duplication , Multigene Family , Phylogeny , Algorithms , Models, Theoretical

5.

SylvX: a viewer for phylogenetic tree reconciliations.

Chevenet, François; Doyon, Jean-Philippe; Scornavacca, Celine; Jacox, Edwin; Jousselin, Emmanuelle; Berry, Vincent.

Bioinformatics ; 32(4): 608-10, 2016 Feb 15.

Article in English | MEDLINE | ID: mdl-26515823

ABSTRACT

MOTIVATION: Reconciliation methods aim at recovering the evolutionary processes that shaped the history of a given gene family including events such as duplications, transfers and losses by comparing the discrepancies between the topologies of the associated gene and species trees. These methods are also used in the framework of host/parasite studies to recover co-diversification scenarios including co-speciation events, host-switches and extinctions. These evolutionary processes can be graphically represented as nested trees. These interconnected graphs can be visually messy and hard to interpret, and despite the fact that reconciliations are increasingly used, there is a shortage of tools dedicated to their graphical management. Here we present SylvX, a reconciliation viewer which implements classical phylogenetic graphic operators (swapping, highlighting, etc.) and new methods to ease interpretation and comparison of reconciliations (multiple maps, moving, shrinking sub-reconciliations). AVAILABILITY AND IMPLEMENTATION: SylvX is an open source, cross-platform, standalone editor available for Windows and Unix-like systems including OSX. It is publicly available at www.sylvx.org.

Subjects

Phylogeny , Software , Evolution, Molecular

6.

A fast method for calculating reliable event supports in tree reconciliations via Pareto optimality.

To, Thu-Hien; Jacox, Edwin; Ranwez, Vincent; Scornavacca, Celine.

BMC Bioinformatics ; 16: 384, 2015 Nov 14.

Article in English | MEDLINE | ID: mdl-26573665

ABSTRACT

BACKGROUND: Given a gene and a species tree, reconciliation methods attempt to retrieve the macro-evolutionary events that best explain the discrepancies between the two tree topologies. The DTL parsimonious approach searches for a most parsimonious reconciliation between a gene tree and a (dated) species tree, considering four possible macro-evolutionary events (speciation, duplication, transfer, and loss) with specific costs. Unfortunately, many events are erroneously predicted due to errors in the input trees, inappropriate input cost values or because of the existence of several equally parsimonious scenarios. It is thus crucial to provide a measure of the reliability for predicted events. It has been recently proposed that the reliability of an event can be estimated via its frequency in the set of most parsimonious reconciliations obtained using a variety of reasonable input cost vectors. To compute such a support, a straightforward but time-consuming approach is to generate the costs slightly departing from the original ones, independently compute the set of all most parsimonious reconciliations for each vector, and combine these sets a posteriori. Another proposed approach uses Pareto-optimality to partition cost values into regions which induce reconciliations with the same number of DTL events. The support of an event is then defined as its frequency in the set of regions. However, often, the number of regions is not large enough to provide reliable supports. RESULTS: We present here a method to compute efficiently event supports via a polynomial-sized graph, which can represent all reconciliations for several different costs. Moreover, two methods are proposed to take into account alternative input costs: either explicitly providing an input cost range or allowing a tolerance for the over cost of a reconciliation. Our methods are faster than the region based method, substantially faster than the sampling-costs approach, and have a higher event-prediction accuracy on simulated data. CONCLUSIONS: We propose a new approach to improve the accuracy of event supports for parsimonious reconciliation methods to account for uncertainty in the input costs. Furthermore, because of their speed, our methods can be used on large gene families. Our algorithms are implemented in the ecceTERA program, freely available from http://mbb.univ-montp2.fr/MBB/.

Subjects

Evolution, Molecular , Phylogeny , Proteobacteria/genetics , Algorithms , Computer Simulation , Genes, Bacterial , Reproducibility of Results
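
The abstract of entry 6 defines the support of an event as its frequency in a set of most parsimonious reconciliations. A minimal sketch of that frequency count follows, assuming the reconciliations have already been enumerated explicitly (the straightforward approach the abstract calls time-consuming, not the polynomial-sized graph the paper proposes). The function name `event_supports` and the event tuples are hypothetical and are not part of ecceTERA.

```python
from collections import Counter


def event_supports(reconciliations):
    """Return, for each event, the fraction of reconciliations containing it."""
    if not reconciliations:
        return {}
    counts = Counter()
    for events in reconciliations:
        # Count each event at most once per reconciliation.
        counts.update(set(events))
    total = len(reconciliations)
    return {event: n / total for event, n in counts.items()}


# Hypothetical toy input: each reconciliation is a set of
# (event type, gene node, species location) tuples.
toy_reconciliations = [
    {("S", "g1", "AB"), ("D", "g2", "A")},
    {("S", "g1", "AB"), ("T", "g2", "A->B")},
    {("S", "g1", "AB"), ("D", "g2", "A")},
]

supports = event_supports(toy_reconciliations)
for event, support in sorted(supports.items()):
    print(event, round(support, 2))
```

In this toy input the speciation is present in every reconciliation, while the duplication and the transfer are competing explanations for the same gene node and therefore share the support between them.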

7.

Joint amalgamation of most parsimonious reconciled gene trees.

Scornavacca, Celine; Jacox, Edwin; Szöllosi, Gergely J.

Bioinformatics ; 31(6): 841-8, 2015 Mar 15.

Article in English | MEDLINE | ID: mdl-25380957

ABSTRACT

MOTIVATION: Traditionally, gene phylogenies have been reconstructed solely on the basis of molecular sequences; this, however, often does not provide enough information to distinguish between statistically equivalent relationships. To address this problem, several recent methods have incorporated information on the species phylogeny in gene tree reconstruction, leading to dramatic improvements in accuracy. Probabilistic methods are able to estimate all model parameters but are computationally expensive, whereas parsimony methods, which are generally computationally more efficient, require a prior estimate of parameters and of the statistical support. RESULTS: Here, we present the Tree Estimation using Reconciliation (TERA) algorithm, a parsimony-based, species-tree-aware method for gene tree reconstruction based on a scoring scheme combining duplication, transfer and loss costs with an estimate of the sequence likelihood. TERA explores all reconciled gene trees that can be amalgamated from a sample of gene trees. Using a large-scale simulated dataset, we demonstrate that TERA achieves the same accuracy as the corresponding probabilistic method while being faster, and outperforms other parsimony-based methods in both accuracy and speed. Running TERA on a set of 1099 homologous gene families from complete cyanobacterial genomes, we find that incorporating knowledge of the species tree results in a two-thirds reduction in the number of apparent transfer events.

Subjects

Algorithms , Cyanobacteria/genetics , Evolution, Molecular , Genome, Bacterial , Phylogeny , Computer Simulation , Cyanobacteria/classification , Gene Duplication , Multigene Family

8.

Tissue-specific and ubiquitous expression patterns from alternative promoters of human genes.

Jacox, Edwin; Gotea, Valer; Ovcharenko, Ivan; Elnitski, Laura.

PLoS One ; 5(8): e12274, 2010 Aug 18.

Article in English | MEDLINE | ID: mdl-20806066

ABSTRACT

BACKGROUND: Transcriptome diversity provides the key to cellular identity. One important contribution to expression diversity is the use of alternative promoters, which creates mRNA isoforms by expanding the choice of transcription initiation sites of a gene. The proximity of the basal promoter to the transcription initiation site enables prediction of a promoter's location based on the gene annotations. We show that annotation of alternative promoters regulating expression of transcripts with distinct first exons enables a novel methodology to quantify expression levels and tissue specificity of mRNA isoforms. PRINCIPAL FINDINGS: The use of distinct alternative first exons in 3,296 genes was examined using exon-microarray data from 11 human tissues. Comparing two transcripts from each gene, we found that the activity of alternative promoters (i.e., P1 and P2) was not correlated through tissue specificity or level of expression. Furthermore, neither P1 nor P2 conferred any bias for tissue-specific or ubiquitous expression. Genes associated with specific diseases produced transcripts whose limited expression patterns were consistent with the tissue affected in disease. Notably, genes that were historically designated as tissue-specific or housekeeping had alternative isoforms that showed differential expression. Furthermore, only a small number of alternative promoters showed expression exclusive to a single tissue, indicating that "tissue preference" provides a better description of promoter activity than tissue specificity. When compared to gene expression data in public databases, as few as 22% of the genes had detailed information for more than one isoform, whereas the remainder collapsed the expression patterns from individual transcripts into one profile. CONCLUSIONS: We describe a computational pipeline that uses microarray data to assess the level of expression and breadth of tissue profiles for transcripts with distinct first exons regulated by alternative promoters. We conclude that alternative promoters provide individualized regulation that is confirmed through expression levels, tissue preference and chromatin modifications. Although the selective use of alternative promoters often goes uncharacterized in gene expression analyses, transcripts produced in this manner make unique contributions to the cell that require further exploration.

Subjects

Gene Expression Profiling , Promoter Regions, Genetic/genetics , Computational Biology , Databases, Genetic , Disease/genetics , Entropy , Epigenesis, Genetic/genetics , Exons/genetics , Genomics , Hepatocyte Nuclear Factor 4/genetics , Humans , Mutation , Nucleic Acid Hybridization , Oligonucleotide Array Sequence Analysis , Organ Specificity , Phenotype , RNA, Messenger/genetics

9.

WordSeeker: concurrent bioinformatics software for discovering genome-wide patterns and word-based genomic signatures.

Lichtenberg, Jens; Kurz, Kyle; Liang, Xiaoyu; Al-ouran, Rami; Neiman, Lev; Nau, Lee J; Welch, Joshua D; Jacox, Edwin; Bitterman, Thomas; Ecker, Klaus; Elnitski, Laura; Drews, Frank; Lee, Stephen Sauchi; Welch, Lonnie R.

BMC Bioinformatics ; 11 Suppl 12: S6, 2010 Dec 21.

Article in English | MEDLINE | ID: mdl-21210985

ABSTRACT

BACKGROUND: An important focus of genomic science is the discovery and characterization of all functional elements within genomes. In silico methods are used in genome studies to discover putative regulatory genomic elements (called words or motifs). Although a number of methods have been developed for motif discovery, most of them lack the scalability needed to analyze large genomic data sets. METHODS: This manuscript presents WordSeeker, an enumerative motif discovery toolkit that utilizes multi-core and distributed computational platforms to enable scalable analysis of genomic data. A controller task coordinates activities of worker nodes, each of which (1) enumerates a subset of the DNA word space and (2) scores words with a distributed Markov chain model. RESULTS: A comprehensive suite of performance tests was conducted to demonstrate the performance, speedup and efficiency of WordSeeker. The scalability of the toolkit enabled the analysis of the entire genome of Arabidopsis thaliana; the results of the analysis were integrated into The Arabidopsis Gene Regulatory Information Server (AGRIS). A public version of WordSeeker was deployed on the Glenn cluster at the Ohio Supercomputer Center. CONCLUSION: WordSeeker effectively utilizes concurrent computing platforms to enable the identification of putative functional elements in genomic data sets. This capability facilitates the analysis of the large quantity of sequenced genomic data.

Subjects

DNA/chemistry , Genomics/methods , Regulatory Sequences, Nucleic Acid , Software , Algorithms , Arabidopsis/genetics , Genome, Plant , Markov Chains , Sequence Analysis, DNA
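
The abstract of entry 9 describes workers that enumerate part of the DNA word space and score words with a Markov chain model. The single-process sketch below illustrates that idea only. It assumes a first-order Markov background estimated from the input sequences themselves and a log2 observed/expected score. Neither choice, nor any of the function names, is taken from WordSeeker.

```python
from collections import Counter
import math


def count_words(seqs, k):
    """Count every overlapping word of length k in the sequences."""
    counts = Counter()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def markov_expected(word, mono, di, positions):
    """Expected count of a word under a first-order Markov chain."""
    prob = mono[word[0]] / sum(mono.values())
    for a, b in zip(word, word[1:]):
        # Transition probability P(b | a) from dinucleotide counts.
        from_a = sum(n for pair, n in di.items() if pair[0] == a)
        if from_a == 0:
            return 0.0
        prob *= di[a + b] / from_a
    return prob * positions


def score_words(seqs, k):
    """Score each observed word by log2(observed / expected)."""
    mono, di, observed = count_words(seqs, 1), count_words(seqs, 2), count_words(seqs, k)
    positions = sum(observed.values())
    scores = {}
    for word, obs in observed.items():
        expected = markov_expected(word, mono, di, positions)
        scores[word] = math.log2(obs / expected) if expected > 0 else float("inf")
    return scores


# Hypothetical toy sequences, for illustration only.
toy_seqs = ["ACGTACGTACGT", "TTTTACGTTTTT"]
word_counts = count_words(toy_seqs, 4)
word_scores = score_words(toy_seqs, 4)
for word in sorted(word_scores, key=word_scores.get, reverse=True)[:3]:
    print(word, word_counts[word], round(word_scores[word], 2))
```

A distributed version would split the word space (for example by word prefix) across workers and merge the resulting counters, which is the role the abstract assigns to the controller task.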

10.

Word-based characterization of promoters involved in human DNA repair pathways.

Lichtenberg, Jens; Jacox, Edwin; Welch, Joshua D; Kurz, Kyle; Liang, Xiaoyu; Yang, Mary Qu; Drews, Frank; Ecker, Klaus; Lee, Stephen S; Elnitski, Laura; Welch, Lonnie R.

BMC Genomics ; 10 Suppl 1: S18, 2009 Jul 07.

Article in English | MEDLINE | ID: mdl-19594877

ABSTRACT

BACKGROUND: DNA repair genes provide an important contribution towards the surveillance and repair of DNA damage. These genes produce a large network of interacting proteins whose mRNA expression is likely to be regulated by similar regulatory factors. Full characterization of promoters of DNA repair genes and the similarities among them will more fully elucidate the regulatory networks that activate or inhibit their expression. To address this goal, the authors introduce a technique to find regulatory genomic signatures, which represents a specific application of the genomic signature methodology to classify DNA sequences as putative functional elements within a single organism. RESULTS: The effectiveness of the regulatory genomic signatures is demonstrated via analysis of promoter sequences for genes in DNA repair pathways of humans. The promoters are divided into two classes, the bidirectional promoters and the unidirectional promoters, and distinct genomic signatures are calculated for each class. The genomic signatures include statistically overrepresented words, word clusters, and co-occurring words. The robustness of this method is confirmed by the ability to identify sequences that exist as motifs in TRANSFAC and JASPAR databases, and in overlap with verified binding sites in this set of promoter regions. CONCLUSION: The word-based signatures are shown to be effective by finding occurrences of known regulatory sites. Moreover, the signatures of the bidirectional and unidirectional promoters of human DNA repair pathways are clearly distinct, exhibiting virtually no overlap. In addition to providing an effective characterization method for related DNA sequences, the signatures elucidate putative regulatory aspects of DNA repair pathways, which are notably under-characterized.

Subjects

Computational Biology/methods , DNA Repair , Promoter Regions, Genetic , Base Composition , Cluster Analysis , Databases, Genetic , Humans , Models, Statistical

11.

Finding Occurrences of Relevant Functional Elements in Genomic Signatures.

Jacox, Edwin; Elnitski, Laura.

Int J Comput Sci ; 2(5): 599-606, 2008 Oct 01.

Article in English | MEDLINE | ID: mdl-20046539

ABSTRACT

For genomic applications, signature-finding algorithms identify over-represented signatures (words) in collections of DNA sequences. The results can be presented as a specific sequence of bases, a consensus sequence showing possible combinations of bases, or a matrix of weighted possibilities at each position. These results are often compared to a biological set of binding sites (i.e., known functional elements), which are usually represented as weight matrices. The comparison is made by scoring the signatures against each weight matrix to identify the best option for a positive hit. However, this approach can misclassify results when applied to short sequences, which are a frequent result of signature finders. We describe a novel method using a window around the original sequences (those on which the signature is based) to improve the comparison and identify a more significant measure of similarity. In doing so, our method transforms a list of DNA signatures into a resource of characterized binding sites with known functional roles and identifies novel elements in need of further elucidation.
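
The comparison described in the abstract of entry 11 can be illustrated with a small sketch. It assumes a log-odds weight matrix built from base counts with a pseudocount and a uniform background, and it widens a short signature by a fixed flank taken from its source sequence. The matrix, the sequences, the flank size and the function names are all hypothetical, and the paper's actual scoring and significance measure are not reproduced here.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, pseudo=1.0, background=0.25):
    """Convert per-position base counts into log2 odds against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudo
        matrix.append({b: math.log2((column[b] + pseudo) / total / background) for b in BASES})
    return matrix


def best_score(seq, matrix):
    """Best full-length alignment score of the matrix in seq, or None if seq is too short."""
    width = len(matrix)
    if len(seq) < width:
        return None
    return max(
        sum(matrix[i][seq[start + i]] for i in range(width))
        for start in range(len(seq) - width + 1)
    )


def windowed(source, start, length, flank):
    """Return the signature occurrence plus up to `flank` bases on each side."""
    return source[max(0, start - flank):min(len(source), start + length + flank)]


# Hypothetical 7-bp site with consensus TGACTCA (8 counts on the consensus base).
site_counts = [{b: (8 if b == c else 0) for b in BASES} for c in "TGACTCA"]
matrix = log_odds_matrix(site_counts)

source = "GGCATGACTCATCC"   # hypothetical sequence the signature came from
signature = "GACT"          # short signature found at index 5 of source

alone = best_score(signature, matrix)       # too short to align against the matrix
window = windowed(source, 5, len(signature), flank=2)
with_window = best_score(window, matrix)
print(alone, window, round(with_window, 2))
```

The 4-base signature cannot be aligned against the 7-column matrix on its own, whereas the windowed occurrence covers the whole site and can be scored against it.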
